<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
                "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head>
  <title>Description of v_estnoiseg</title>
  <meta name="keywords" content="v_estnoiseg">
  <meta name="description" content="V_ESTNOISEG - estimate MMSE noise spectrum [x,zo]=(yf,tz,pp)">
  <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
  <meta name="generator" content="m2html &copy; 2003 Guillaume Flandin">
  <meta name="robots" content="index, follow">
  <link type="text/css" rel="stylesheet" href="../m2html.css">
</head>
<body>
<a name="_top"></a>
<div><a href="../index.html">Home</a> &gt;  <a href="index.html">v_mfiles</a> &gt; v_estnoiseg.m</div>

<!--<table width="100%"><tr><td align="left"><a href="../index.html"><img alt="<" border="0" src="../left.png">&nbsp;Master index</a></td>
<td align="right"><a href="index.html">Index for v_mfiles&nbsp;<img alt=">" border="0" src="../right.png"></a></td></tr></table>-->

<h1>v_estnoiseg
</h1>

<h2><a name="_name"></a>PURPOSE <a href="#_top"><img alt="^" border="0" src="../up.png"></a></h2>
<div class="box"><strong>V_ESTNOISEG - estimate MMSE noise spectrum [x,zo]=(yf,tz,pp)</strong></div>

<h2><a name="_synopsis"></a>SYNOPSIS <a href="#_top"><img alt="^" border="0" src="../up.png"></a></h2>
<div class="box"><strong>function [x,zo]=v_estnoiseg(yf,tz,pp) </strong></div>

<h2><a name="_description"></a>DESCRIPTION <a href="#_top"><img alt="^" border="0" src="../up.png"></a></h2>
<div class="fragment"><pre class="comment">V_ESTNOISEG - estimate MMSE noise spectrum [x,zo]=(yf,tz,pp)

 Usage:    ninc=round(0.016*fs);   % frame increment [fs=sample frequency]
           ovf=2;                  % overlap factor
           f=v_rfft(v_enframe(s,hanning(ovf*ninc,'periodic'),ninc),ovf*ninc,2);
           f=f.*conj(f);           % convert to power spectrum
           x=v_estnoiseg(f,ninc/fs); % estimate the noise power spectrum

 Inputs:
   yf      input power spectra (one row per frame)
   tz      frame increment in seconds
           Alternatively, the input state from a previous call (see below)
   pp      algorithm parameters [optional]

 Outputs:
   x       estimated noise power spectra (one row per frame)
   zo      output state

 The algorithm parameters are defined in reference [1] from which equation
 numbers are given in parentheses. They are as follows:

        pp.tax      % smoothing time constant for noise power estimate [0.0717 seconds](8)
        pp.tap      % smoothing time constant for smoothed speech prob [0.152 seconds](23)
        pp.psthr    % threshold for smoothed speech probability [0.99] (24)
        pp.pnsaf    % noise probability safety value [0.01] (24)
        pp.pspri    % prior speech probability [0.5] (18)
        pp.asnr     % active SNR in dB [15] (18)
        pp.psini    % initial speech probability [0.5] (23)
        pp.tavini   % assumed speech absent time at start [0.064 seconds]

 If convenient, you can call v_estnoiseg in chunks of arbitrary size. Thus the following are equivalent:

                   (a) dp=v_estnoiseg(yp(1:300),tinc);

                   (b) [dp(1:100,:),z]=v_estnoiseg(yp(1:100,:),tinc);
                       [dp(101:200,:),z]=v_estnoiseg(yp(101:200,:),z);
                       [dp(201:300,:),z]=v_estnoiseg(yp(201:300,:),z);</pre></div>

<!-- crossreference -->
<h2><a name="_cross"></a>CROSS-REFERENCE INFORMATION <a href="#_top"><img alt="^" border="0" src="../up.png"></a></h2>
This function calls:
<ul style="list-style-image:url(../matlabicon.gif)">
<li><a href="v_axisenlarge.html" class="code" title="function v_axisenlarge(f,h)">v_axisenlarge</a>	V_AXISENLARGE - enlarge the axes of a figure (f,h)</li></ul>
This function is called by:
<ul style="list-style-image:url(../matlabicon.gif)">
<li><a href="v_activlevg.html" class="code" title="function [lev,xx] = v_activlevg(sp,fs,mode)">v_activlevg</a>	V_ACTIVLEVG Measure active speech level robustly [LEV,AF,FSO]=(sp,FS,MODE)</li><li><a href="v_specsub.html" class="code" title="function [ss,gg,tt,ff,zo]=v_specsub(si,fsz,pp)">v_specsub</a>	V_SPECSUB performs speech enhancement using spectral subtraction [SS,ZO]=(S,FSZ,P)</li><li><a href="v_ssubmmse.html" class="code" title="function [ss,gg,tt,ff,zo]=v_ssubmmse(si,fsz,pp)">v_ssubmmse</a>	V_SSUBMMSE performs speech enhancement using mmse estimate of spectral amplitude or log amplitude [SS,ZO]=(S,FSZ,PP)</li><li><a href="v_vadsohn.html" class="code" title="function [vs,zo]=v_vadsohn(si,fsz,m,pp)">v_vadsohn</a>	V_VADSOHN implements a voice activity detector [VS,ZO]=(S,FSZ,M,P)</li></ul>
<!-- crossreference -->


<h2><a name="_source"></a>SOURCE CODE <a href="#_top"><img alt="^" border="0" src="../up.png"></a></h2>
<div class="fragment"><pre>0001 <a name="_sub0" href="#_subfunctions" class="code">function [x,zo]=v_estnoiseg(yf,tz,pp)</a>
0002 <span class="comment">%V_ESTNOISEG - estimate MMSE noise spectrum [x,zo]=(yf,tz,pp)</span>
0003 <span class="comment">%</span>
0004 <span class="comment">% Usage:    ninc=round(0.016*fs);   % frame increment [fs=sample frequency]</span>
0005 <span class="comment">%           ovf=2;                  % overlap factor</span>
0006 <span class="comment">%           f=v_rfft(v_enframe(s,hanning(ovf*ninc,'periodic'),ninc),ovf*ninc,2);</span>
0007 <span class="comment">%           f=f.*conj(f);           % convert to power spectrum</span>
0008 <span class="comment">%           x=v_estnoiseg(f,ninc/fs); % estimate the noise power spectrum</span>
0009 <span class="comment">%</span>
0010 <span class="comment">% Inputs:</span>
0011 <span class="comment">%   yf      input power spectra (one row per frame)</span>
0012 <span class="comment">%   tz      frame increment in seconds</span>
0013 <span class="comment">%           Alternatively, the input state from a previous call (see below)</span>
0014 <span class="comment">%   pp      algorithm parameters [optional]</span>
0015 <span class="comment">%</span>
0016 <span class="comment">% Outputs:</span>
0017 <span class="comment">%   x       estimated noise power spectra (one row per frame)</span>
0018 <span class="comment">%   zo      output state</span>
0019 <span class="comment">%</span>
0020 <span class="comment">% The algorithm parameters are defined in reference [1] from which equation</span>
0021 <span class="comment">% numbers are given in parentheses. They are as follows:</span>
0022 <span class="comment">%</span>
0023 <span class="comment">%        pp.tax      % smoothing time constant for noise power estimate [0.0717 seconds](8)</span>
0024 <span class="comment">%        pp.tap      % smoothing time constant for smoothed speech prob [0.152 seconds](23)</span>
0025 <span class="comment">%        pp.psthr    % threshold for smoothed speech probability [0.99] (24)</span>
0026 <span class="comment">%        pp.pnsaf    % noise probability safety value [0.01] (24)</span>
0027 <span class="comment">%        pp.pspri    % prior speech probability [0.5] (18)</span>
0028 <span class="comment">%        pp.asnr     % active SNR in dB [15] (18)</span>
0029 <span class="comment">%        pp.psini    % initial speech probability [0.5] (23)</span>
0030 <span class="comment">%        pp.tavini   % assumed speech absent time at start [0.064 seconds]</span>
0031 <span class="comment">%</span>
0032 <span class="comment">% If convenient, you can call v_estnoiseg in chunks of arbitrary size. Thus the following are equivalent:</span>
0033 <span class="comment">%</span>
0034 <span class="comment">%                   (a) dp=v_estnoiseg(yp(1:300),tinc);</span>
0035 <span class="comment">%</span>
0036 <span class="comment">%                   (b) [dp(1:100,:),z]=v_estnoiseg(yp(1:100,:),tinc);</span>
0037 <span class="comment">%                       [dp(101:200,:),z]=v_estnoiseg(yp(101:200,:),z);</span>
0038 <span class="comment">%                       [dp(201:300,:),z]=v_estnoiseg(yp(201:300,:),z);</span>
0039 
0040 
0041 <span class="comment">% This is intended to be a precise implementation of [1] for a frame rate of 62.5 Hz.</span>
0042 <span class="comment">% Time constants are adjusted for other frame rates.</span>
0043 <span class="comment">%</span>
0044 <span class="comment">% Refs:</span>
0045 <span class="comment">%    [1] Gerkmann, T. &amp; Hendriks, R. C.</span>
0046 <span class="comment">%        Unbiased MMSE-Based Noise Power Estimation With Low Complexity and Low Tracking Delay</span>
0047 <span class="comment">%        IEEE Trans Audio, Speech, Language Processing, 2012, 20, 1383-1393</span>
0048 
0049 <span class="comment">%       Copyright (C) Mike Brookes 2012</span>
0050 <span class="comment">%      Version: $Id: v_estnoiseg.m 10865 2018-09-21 17:22:45Z dmb $</span>
0051 <span class="comment">%</span>
0052 <span class="comment">%   VOICEBOX is a MATLAB toolbox for speech processing.</span>
0053 <span class="comment">%   Home page: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html</span>
0054 <span class="comment">%</span>
0055 <span class="comment">%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%</span>
0056 <span class="comment">%   This program is free software; you can redistribute it and/or modify</span>
0057 <span class="comment">%   it under the terms of the GNU General Public License as published by</span>
0058 <span class="comment">%   the Free Software Foundation; either version 2 of the License, or</span>
0059 <span class="comment">%   (at your option) any later version.</span>
0060 <span class="comment">%</span>
0061 <span class="comment">%   This program is distributed in the hope that it will be useful,</span>
0062 <span class="comment">%   but WITHOUT ANY WARRANTY; without even the implied warranty of</span>
0063 <span class="comment">%   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the</span>
0064 <span class="comment">%   GNU General Public License for more details.</span>
0065 <span class="comment">%</span>
0066 <span class="comment">%   You can obtain a copy of the GNU General Public License from</span>
0067 <span class="comment">%   http://www.gnu.org/copyleft/gpl.html or by writing to</span>
0068 <span class="comment">%   Free Software Foundation, Inc.,675 Mass Ave, Cambridge, MA 02139, USA.</span>
0069 <span class="comment">%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%</span>
0070 
0071 [nr,nrf]=size(yf);          <span class="comment">% number of frames and freq bins</span>
0072 x=zeros(nr,nrf);            <span class="comment">% initialize output arrays</span>
0073 <span class="keyword">if</span> isempty(yf) &amp;&amp; isstruct(tz)             <span class="comment">% no real data</span>
0074     zo=tz;                  <span class="comment">% just keep the same state</span>
0075 <span class="keyword">else</span>
0076     <span class="keyword">if</span> isstruct(tz)         <span class="comment">% take parameters from a previous call</span>
0077         nrcum=tz.nrcum;     <span class="comment">% cumulative number of frames</span>
0078         xt=tz.xt;           <span class="comment">% smoothed power spectrum</span>
0079         pslp=tz.pslp;       <span class="comment">% correction factor (9)</span>
0080         tinc=tz.tinc;       <span class="comment">% frame increment</span>
0081         qq=tz.qq;           <span class="comment">% parameter structure</span>
0082     <span class="keyword">else</span>
0083         tinc = tz;          <span class="comment">% second argument is frame increment</span>
0084         nrcum=0;            <span class="comment">% no frames so far</span>
0085         <span class="comment">% default algorithm constants</span>
0086         qq.tax=0.0717;      <span class="comment">% noise output smoothing time constant = -tinc/log(0.8) (8)</span>
0087         qq.tap=0.152;       <span class="comment">% speech prob smoothing time constant = -tinc/log(0.9) (23)</span>
0088         qq.psthr=0.99;      <span class="comment">% threshold for smoothed speech probability [0.99] (24)</span>
0089         qq.pnsaf=0.01;      <span class="comment">% noise probability safety value [0.01] (24)</span>
0090         qq.pspri=0.5;       <span class="comment">% prior speech probability [0.5] (18)</span>
0091         qq.asnr=15;         <span class="comment">% active SNR in dB [15] (18)</span>
0092         qq.psini=0.5;       <span class="comment">% initial speech probability [0.5] (23)</span>
0093         qq.tavini=0.064;        <span class="comment">% assumed speech absent time at start [64 ms]</span>
0094 
0095         <span class="keyword">if</span> nargin&gt;=3 &amp;&amp; ~isempty(pp)  <span class="comment">% update fields from pp input</span>
0096             qqn=fieldnames(qq);
0097             <span class="keyword">for</span> i=1:length(qqn)
0098                 <span class="keyword">if</span> isfield(pp,qqn{i})
0099                     qq.(qqn{i})=pp.(qqn{i});
0100                 <span class="keyword">end</span>
0101             <span class="keyword">end</span>
0102         <span class="keyword">end</span>
0103         pslp=repmat(qq.psini,1,nrf); <span class="comment">% initialize smoothed speech presence prob</span>
0104         xt=[];                       <span class="comment">% initialize just in case the first call has no data</span>
0105     <span class="keyword">end</span>
0106 
0107     <span class="comment">% unpack parameters needed within the loop</span>
0108 
0109     psthr=qq.psthr;     <span class="comment">% threshold for smoothed speech probability [0.99] (24)</span>
0110     pnsaf=qq.pnsaf;     <span class="comment">% noise probability safety value [0.01] (24)</span>
0111 
0112     <span class="comment">% derived algorithm constants</span>
0113 
0114     ax=exp(-tinc/qq.tax); <span class="comment">% noise output smoothing factor = 0.8 (8)</span>
0115     axc=1-ax;
0116     ap=exp(-tinc/qq.tap); <span class="comment">% noise output smoothing factor = 0.9 (23)</span>
0117     apc=1-ap;
0118     xih1=10^(qq.asnr/10); <span class="comment">% speech-present SNR</span>
0119     xih1r=1/(1+xih1)-1;
0120     pfac=(1/qq.pspri-1)*(1+xih1); <span class="comment">% p(noise)/p(speech) (18)</span>
0121 
0122     <span class="keyword">if</span> nrcum==0 &amp;&amp; nr&gt;0       <span class="comment">% initialize values for first frame</span>
0123         xt=qq.psini*mean(yf(1:max(1,min(nr,round(1+qq.tavini/tinc))),:),1);  <span class="comment">% initial noise estimate</span>
0124     <span class="keyword">end</span>
0125 
0126     <span class="comment">% loop for each frame</span>
0127     <span class="keyword">for</span> t=1:nr
0128         yft=yf(t,:);        <span class="comment">% noisy speech power spectrum</span>
0129         ph1y=(1+pfac*exp(xih1r*yft./xt)).^(-1); <span class="comment">% a-posteriori speech presence prob (18)</span>
0130         pslp=ap*pslp+apc*ph1y; <span class="comment">% smoothed speech presence prob (23)</span>
0131         ph1y=min(ph1y,1-pnsaf*(pslp&gt;psthr)); <span class="comment">% limit ph1y (24)</span>
0132         xtr=(1-ph1y).*yft+ph1y.*xt; <span class="comment">% estimated raw noise spectrum (22)</span>
0133         xt=ax*xt+axc*xtr;  <span class="comment">% smooth the noise estimate (8)</span>
0134         x(t,:)=xt;  <span class="comment">% save the noise estimate</span>
0135     <span class="keyword">end</span>
0136     <span class="keyword">if</span> nargout&gt;1    <span class="comment">% we need to store the state for next time</span>
0137         zo.nrcum=nrcum+nr;      <span class="comment">% number of frames so far</span>
0138         zo.xt=xt;          <span class="comment">% smoothed power spectrum</span>
0139         zo.pslp=pslp;               <span class="comment">% correction factor (9)</span>
0140         zo.tinc=tinc;     <span class="comment">% must be the last one</span>
0141         zo.qq=qq;
0142     <span class="keyword">end</span>
0143     <span class="keyword">if</span> ~nargout
0144         clf;
0145         subplot(212);
0146         plot((1:nr)*tinc,10*log10([sum(yf,2) sum(x,2)]))
0147         ylabel(<span class="string">'Frame Energy (dB)'</span>);
0148         xlabel(sprintf(<span class="string">'Time (s)   [%d ms frame incr]'</span>,round(tinc*1000)));
0149         <a href="v_axisenlarge.html" class="code" title="function v_axisenlarge(f,h)">v_axisenlarge</a>([-1 -1.05]);
0150         legend(<span class="string">'input'</span>,<span class="string">'noise'</span>,<span class="string">'Location'</span>,<span class="string">'Best'</span>);
0151         subplot(211);
0152         plot(1:nrf,10*log10([sum(yf,1)'/nr sum(x,1)'/nr]))
0153         ylabel(<span class="string">'Power (dB)'</span>);
0154         xlabel(<span class="string">'Frequency bin'</span>);
0155         <a href="v_axisenlarge.html" class="code" title="function v_axisenlarge(f,h)">v_axisenlarge</a>([-1 -1.05]);
0156         legend(<span class="string">'input'</span>,<span class="string">'noise'</span>,<span class="string">'Location'</span>,<span class="string">'Best'</span>);
0157     <span class="keyword">end</span>
0158 <span class="keyword">end</span>
0159</pre></div>
<hr><address>Generated by <strong><a href="http://www.artefact.tk/software/matlab/m2html/">m2html</a></strong> &copy; 2003</address>
</body>
</html>